<html>
<head>
  <title>Linker</title>
</head>
<body>


<h1>
  Inside the Linker 
</h1>
<div class="doc_author">
  <p>Written by <a href="mailto:kledzik@apple.com">Nick Kledzik</a></p>
</div>


<h2>
  <a name="introduction">Introduction</a>
</h2>

<p>The Darwin linker is a new generation of linker.  It is not "section" based
like traditional linkers which mostly just interlace sections from multiple
object files into the output file.  The Darwin linker is based on "Atoms".
Traditional section based linking work well for simple linking, but their model
makes advanced linking features difficult to implement.  Features like dead code 
stripping, reordering functions for locality, and C++ coalescing require the
linker to work at a finer grain.
</p>

<p>An atom is an indivisible chunk of code or data.  An atom has a set of
attributes, such as: name, scope, content-type, alignment, etc.  An atom also
has a list of Fixups.  A Fixup contains: a kind, an optional offset, an optional
addend, and an optional target atom.</p>

<p>The Atom model allows the linker to use standard graph theory models for 
linking data structures.  Each atom is a node, and each Fixup is an edge. 
The feature of dead code stripping is implemented by following edges to mark
all live atoms, and then delete the non-live atoms.</p>
<br>
<h2>
  <a name="Atom model">Atom model</a>
</h2>

<p>An atom is an indivisible chuck of code or data.  Typically each user
written function or global variable is an atom.  In addition, the compiler may
emit other atoms, such as for literal c-strings or floating point constants, or
for runtime data structures like dwarf unwind info or pointers to initializers.
</p>

<p>A simple "hello world" object file would be modeled like this:</p>
<img src="hello.png" alt="hello world graphic"/>
<p>There are two atoms: main and an anonymous atom containing the c-string
literal "hello world".  The Atom "main" has two fixups.  One is the call site
for the call to printf, and the other is a fixup for the instruction that loads
the address of the c-string literal. </p>

<br>
<h2>
  <a name="File model">File model</a>
</h2>

<p>The linker views the input files as basically containers of Atoms and Fixups,
 and just a few attributes of their own.  The linker works with three kinds
of files: object files, static libraries, and dynamic libraries.  Each kind
of file has reader object which presents the file in the model expected by 
the linker.</p>
<h4> <a>Object File</a> 
</h4>
An object file is just a container of atoms.  When linking with
an object file, all atoms are added to the initial graph of atoms.

<h4> <a>Static Library (Archive)</a> 
</h4>
This is the traditional unix static archive which is just a collection of
object files with a "table of contents". When linking with a static library,
by default nothing is added to the initial graph of atoms. Instead, if there
are unresolved references (dangling edges) in the master graph of all atoms, 
and the table of contents for a static library says that one of the object files 
in the library defines one of the missing symbols (dangling edge), 
the set of atoms from the specified object file in the static library is added 
to the master graph of atoms. 

<h4> <a>Dynamic Library (Shared Object)</a> 
</h4>
Dynamic libraries are unique in that the don't directly add add any atoms.  
Their purpose is to check at build time that all references are resolved and
provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime. 
The way this is modeled in the linker is that a dynamic library contributes
no atoms to the initial graph of atoms.  Instead, (like static libraries) if
there are unresolved references (dangling edges) in the master graph of all atoms, 
if a dynamic library exports a required symbol, then a "proxy" atom is 
instantiated by the linker.  The proxy atom allows the master atom graph to have
all edges resolved and also records from which dynamic library a symbol came.</p>

<br>
<h2>
  <a name="Linking Steps">Linking Steps</a>
</h2>
<p>Through the use of abstract Atoms, the core of linking is architecture 
independent and file format independent.  All command line parsing is factored
out into a separate "options" abstraction which enables the linker to be driven
with different command line sets.</p>
<p>The overall steps in linking are:<p>
<ol>
  <li>Command line processing</li>
  <li>Parsing input files</li>
  <li>Resolving</li>
  <li>Passes/Optimizations</li>
  <li>Generate output file</li>
</ol>

<p>The Resolving and Passes steps are done purely on the master graph of atoms, 
so they have no notion of file formats such as mach-o or ELF.</p>

<h4> <a>Resolving</a> 
</h4>
<p>The resolving step takes all the atoms graphs from each object file and 
combines them into one master object graph.  Unfortunately, it is not as simple
as appending the atom list from each file into one big list.  There are many
cases where atoms need to be coalesced.  That is, two or more atoms need to 
be coalesced into one atom.  This is necessary to support: C language
 "tentative definitions", C++ weak symbols for templates and inlines defined
in headers, and for merging copies of constants like c-strings and floating
point constants.</p>

<p>The linker support coalescing by-name and by-content. By-name is used for
tentative definitions and weak symbols.  By-content is used for constant data
that can be merged. </p>

<p>When one atom has a reference (FixUp) to another atom, there is also a binding
type: by-name, direct, or indirect. A Fixup contains a tagged union that if
the binding type is by-name, the union field is a pointer to a c-string.  If
the binding type is direct, the union is a pointer to an Atom.  If the binding
type is indirect, the union is a index into a table of pointers to Atoms. Below
is a graphical representation of the binding types:</p>
<img src="bindings.png" alt="binding types graphic"/>

<p>Input file Atoms contain only direct and by-name references.  Direct 
references are used for atoms defined in the same object file for which the 
target atom is either unnamed or cannot change.  For instance, calling 
a static function in a translation unit will result in a direct reference 
to the static functions's atom.  Also the FDE (dwarf unwind info) for a function
has a direct reference to its function.  On the other hand references to 
global symbols (e.g. call to printf) use by-name binding in object files.
</p>

<p>The resolving process maintains some global linking "state", including:
a "symbol table" which is a map from c-string to Atom*, an indirect symbol
table which is a growable array of Atom*, and for each kind of coalesable
constants there is a content to Atom* map.  With these data structures,
the linker walks all atoms in all input files. For each
atom, it checks if the atom should be in one symbol table or one of the 
coalescing tables.  If so, it attempts to add the atom.  If there already is
a matching atom in that table, that means the current atom needs to be 
coalesced with the found atom.  
</p>

<p>To support coalescing, all references to coalesable atoms are changed to
indirect binding and an entry is added to the indirect table which points
to the current chosen atom.  When all input atoms have been processed by
the resolver, there should be only direct and indirect bindings left.  If
there are any NULL entries in the indirect table, that means there are  
undefined references.  The linker then looks to the supplied libraries (both
static and dynamic) to resolve those references.  
</p>

<p>Dead code stripping (if requested) is done at the end of resolving.  The
linker does a simple mark-and-sweep. It starts with "root" atoms (like "main"
in a main executable) and follows each references and marks each Atom that
it visits as "live".  When done, all atoms not marked "live" are removed.
</p>

<h4> <a>Passes</a> 
</h4>
<p>The Passes step
is an open ended set of routines that each get a change to modify or enhance
the master graph of atoms. Passes are only run if the master graph of 
atoms is completely resolved (no dangling edges). 
The current set of Passes in the Darwin linker are:</p>
<ul>
  <li>Objective-C optimizations (Apple)</li>
  <li>stub (PLT) generation</li>
  <li>GOT instantiation</li>
  <li>TLV instantiation (Apple)</li>
  <li>order_file optimization</li>
  <li>branch island generation</li>
  <li>branch shim generation</li>
  <li>dtrace probe processing (Apple)</li>
  <li>compact unwind encoding (Apple)</li>
</ul>
<p>Some of these passes are specific to Apple's runtime environments.  But many
of the passes are applicable to any OS (such as generating branch island for 
out of range branch instructions).</p>

<p>The general structure of a pass is to walk the master graph inspecting each
atom and doing something.  For instance, the stub pass, walks the graph looking
for atoms with call sites to proxy atoms (e.g. call to printf).  It then
instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each 
proxy atom needed, and these new atoms are added to the master graph.  Next
all the noted call sites to proxy atoms are replaced with calls to the 
corresponding stub atom.</p>  

<h4><a>Generate Output File</a> 
</h4>
<p>Once the passes are done, the output file generator is given a sorted list
of atoms.  Its job is to create the executable content file wrapper and place
the content of the atoms into it. 
</p>


<h2>
  <a name="Future Directions">Future Directions</a>
</h2>

<h4><a>Sections</a> 
</h4>
<p>The current use of sections in mach-o .o files over-constrains the linker.
By default, the linker should preserve the section an atom is in.  But since
all sections must be contiguous in the output, that limits the ability of
the linker to order atoms for locality.  It would be helpful to enrich the
object file with with reason something is in the section it is.  For instance,
is the section found at runtime? Or was the use of a section just a quick
way to group some content together?
</p>
<p>The ELF model for sections is a little better than mach-o because ELF
sections have write and execute bits, whereas mach-o sections must be in some
segment and the segment has the write and execute bits.  
</p>

<h4><a>Mach-o Object File Format</a> 
</h4>
<p>
The messiest part of the linker is the mach-o parser. This is because mach-o
is a traditional section and symbols based file format.  The parser must infer
atom boundaries using two approaches.  The first is that some section types have  
well defined content which the linker can parse into atoms (e.g.  __cstring, 
__eh_frame). The other approach is a naming convention (which the compiler follows)
by which the linker breaks sections into atoms at any non-local (not starting 
with 'L') symbol. The processing the linker has to do parse mach-o .o files is a
significant part of the link time. 
</p>

<p>Given that the assembler writes object files once, whereas the linker reads
them many times (during development), it would make sense to optimize the object
file format to be something the linker can read/parse efficiently.</p>  

<h4><a>New Object File Model</a> 
</h4>
<p>LLVM has a nice model for its IR.  There are three representations:
the binary bit code file, the in-memory object model, and a textual 
representation.  LLVM contains utility possible code for converting between these
representations.  The same model makes sense for atoms too.  There should be
three representations for atoms: binary file, in-memory, and textual. The Darwin 
linker already has an in-memory C++ object model for Atoms.  All we need is a 
textual representation and binary file format.
</p>
<p>Note: in the darwin linker the binary format for input object files is  
independent of the output executable format.  That is, we could have one 
universal object file format which the linker could use as input to produce 
mach-o, ELF, or PE executables.</p>
<p>
The object file binary format should be designed to instantiate into atoms
as fast as possible.  The obvious way to do that is that the 
file format would be an array of atoms.  The linker just mmaps in the file and
looks at the header to see how many atoms there and instantiate that many atoms
with the atom attribute information coming from that array.  The trick is 
designing this in a way that can be extended as the Atom mode evolves and new
attributes are added.
</p>
<p>
In designing a textual format we want something easy for humans to read and
easy for the linker to parse.  Since an atom has lots of attributes most of
which are usually just the default, we should define default values for 
every attribute so that those can be omitted from the text representation.
One possile format is YAML.  Here is the atoms for a simple hello world
program expressed in YAML.
</p>
<pre>
---
target-triple:   x86_64-apple-darwin11
source:

atoms:
    - name:    _main
      scope:   linkage-unit
      type:    code
      alignment: 
          power: 4
      content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
                 00, 00, 31, c0, 5d, c3 ]
      fixups:
      - offset: 07
        kind:   pcrel32
        target: 2
      - offset: 0E
        kind:   call32
        target: _fprintf

    - type:    c-string
      merge:   by-content
      content: [ 73, 5A, 00 ]

...
</pre>

<p>One big use for the textual format will be writing test cases. The Darwin
linker test suite test cases are written mostly in C/C++ and a few assembly
files.  The use of C means the same test case can be compiled for different
architectures.  But writing test cases in C is problematic because the compiler 
may vary its output over time for its own optimization reasons which my 
inadvertently disable or break the linker feature trying to be tested. By 
writing test cases in the linkers own textual format, we can exactly specify 
every attribute of every atom and thus target specific linker logic.
</p>

<h4><a>Debug Info</a> 
</h4>
<p>Around 2005 when Apple switched from using STABS to using DWARF for debug 
information, we made a design decision to have the linker ignore DWARF in
.o files.  This improves linking performance because the linker is not
copying tons of debug info.  Instead, the linker adds "debug notes" into
output binary that contain the paths of the original .o files. During development
the Darwin debugger will notice the debug notes and the load the dwarf
debug information from the original object files.  For release builds,
a tool named dsymutil is run on the program.  It finds the debug notes and
then the original object files, then reads, merges and optimizes all the dwarf
debug information into one .dSYM file which can be loaded by the debugger
if needed.</p>

<p>The current way DWARF is generated is that all debug information for all
functions in a translation unit are merged and optimized into sections based 
on debug info kind.  For instance the mapping of instructions to source line
numbers for all functions is compressed and put in one section. This does not
play well in an Atom based file format.  One idea is to have the compiler
emit some intermediate representation debug information (one which is 
partitioned per atom) into the Atom based file format.  The linker could 
then have code to convert that intermediate debug into to final dwarf.
This is still an open question.</p>

<h4><a>Extending Atom attributes to ELF and XCOFF</a> 
</h4>
<p>The current set of attributes defined for Atoms in the darwin linker
were chosen to meet the requirements of developing code to run on iOS and 
Mac OS X.  Below is a list of the attributes and their possible values.
It may just require adding more values to support ELF and XCOFF.  Or there
may need to be new attributes added to capture new functionality.
</p>
<ul>
  <li>Name</li>
  <li>Size</li>
  <li>Section (I'd like to get rid of this)</li>
  <li>ContentType (currently some of this comes from section)</li>
  <ul>
	  <li>code</li>
	  <li>stub</li>
	  <li>data</li>
	  <li>zeroFill</li>
	  <li>initializerPointer</li>
	  <li>objc1Class</li>
	  <li>objc2Class</li>
	  <li>objcClassPointer</li>
	  <li>objc2CategoryList</li>
	  <li>non-lazy-pointer</li>
	  <li>lazy-pointer</li>
	  <li>constant</li>
	  <li>literal4</li>
	  <li>literal8</li>
	  <li>literal16</li>
	  <li>cstring</li>
	  <li>cstringPointer</li>
	  <li>utf16string</li>
	  <li>CFString</li>
	  <li>CFI</li>
	  <li>LSDA</li>
	  </ul>
  </li>
  <li>Scope
  <ul>
	  <li>translationUnit  (static functions)</li>
	  <li>linkageUnit      (visibility hidden)</li>
	  <li>global</li>
	  </ul>
  </li>
  <li>DefinitionKind
  <ul>
	  <li>regular</li>
	  <li>tentative         (ANSI C feature)</li>
	  <li>absolute          (assembly code feature)</li>
	  <li>proxy             (stand-in for dynamic library symbol)</li>
  </ul>
  </li>
  <li>Combine
  <ul>
	  <li>never</li>
	  <li>byName          (weak symbols)</li>
	  <li>byContent       (simple constants)</li>
	  <li>byContentAndReferences (complex constants)</li>
  </ul>
  </li>
  <li>SymbolTableStatus
  <ul>
	  <li>In</li>
	  <li>notIn              (anonymous)</li>
	  <li>inAsAbsolute       (assembly code feature)</li>
	  <li>inAndNeverStrip    (tell strip tool to leave)</li>
	  <li>inWithRandomName   (mach-o .o feature)</li>
  </ul>
  <li>Alignment
  <ul>
	  <li>powerOfTwo</li>
	  <li>modulus</li>
  </ul>
  <li>NeverDeadStrip (boolean)</li>
  <li>IsThumb (ARM specific)</li>
</ul>
<p>Where does dllexport fit in here?  Where does visibility protected and 
internal fit?  Protected seems like scope=global plus the rule to not 
indirect references to it.  Internal is like hidden plus enables some
compiler optimizations.  I'm not sure the linker needs to know about internal.
</p>

</body>
</html>

